Design alternatives for shared memory multiprocessors
Abstract
In this paper, we consider the design alternatives available for building the next generation of distributed shared memory (DSM) machines, e.g., the choice of memory architecture, network technology, and the amount and location of the per-node remote data cache. To investigate this design space, we have simulated five applications on a wide variety of possible DSM architectures that employ significantly different caching techniques. We also examine the impact of using a special-purpose system interconnect designed specifically to support low-latency DSM operation versus using a powerful off-the-shelf system interconnect. We found that two architectures offer the best combination of good average performance and reasonable worst-case performance: CC-NUMA employing a moderate-sized DRAM remote access cache (RAC), and a hybrid CC-NUMA/S-COMA architecture called AS-COMA, or adaptive S-COMA. Both pure CC-NUMA and pure S-COMA have serious performance problems for some applications, while CC-NUMA employing an SRAM RAC does not perform as well as the two architectures that employ larger DRAM caches. The paper concludes with several recommendations to designers of next-generation DSM machines, each accompanied by a discussion of the issues that led to it, so that designers can decide which recommendations remain relevant to them given changes in technology and corporate priorities.

*Mark Swanson is now at Intel Corporation.

This research was supported in part by the Space and Naval Warfare Systems Command (SPAWAR) and the Advanced Research Projects Agency (ARPA), under SPAWAR contract No. N0039-95-C-0018 and ARPA Order No. B990. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA, the Air Force Research Laboratory, or the US Government.
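The SRAM-versus-DRAM RAC trade-off mentioned above can be illustrated with a simple expected-latency calculation. The sketch below is not the paper's simulator and does not reproduce its results: the class and function names are hypothetical, the hit rates and latencies are made-up values, and the model ignores contention, coherence traffic, interconnect choice, and S-COMA page-cache effects. It shows only that a larger but slower DRAM RAC can beat a smaller but faster SRAM RAC when its hit rate is high enough.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteAccessCache:
    """Hypothetical RAC configuration; all numbers used below are illustrative only."""
    name: str
    hit_rate: float        # fraction of remote misses served by the local RAC
    hit_latency_ns: float  # cost of a RAC hit (SRAM assumed faster than DRAM)


def avg_remote_miss_latency(rac: RemoteAccessCache, remote_latency_ns: float) -> float:
    """Expected cost of a processor-cache miss to remote data.

    Simplifying assumption: a RAC miss costs exactly one remote access
    (RAC lookup overlapped, no contention, no coherence traffic modeled).
    """
    if not 0.0 <= rac.hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    return rac.hit_rate * rac.hit_latency_ns + (1.0 - rac.hit_rate) * remote_latency_ns


# Hypothetical parameters: no RAC, a small fast SRAM RAC, a larger slower DRAM RAC.
REMOTE_NS = 1000.0
CONFIGS = [
    RemoteAccessCache("no RAC", hit_rate=0.0, hit_latency_ns=0.0),
    RemoteAccessCache("SRAM RAC", hit_rate=0.25, hit_latency_ns=100.0),
    RemoteAccessCache("DRAM RAC", hit_rate=0.75, hit_latency_ns=200.0),
]

if __name__ == "__main__":
    for rac in CONFIGS:
        print(f"{rac.name}: {avg_remote_miss_latency(rac, REMOTE_NS):.0f} ns")
```

With these assumed numbers the script prints 1000 ns, 775 ns, and 400 ns for the three configurations. Under the same assumptions the DRAM RAC averages 1000 - 800h ns for hit rate h, so it beats the SRAM RAC only when h exceeds about 0.28; with a lower hit rate the ordering reverses.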